Alignment by Bilingual Generation and Monolingual Derivation

Authors

  • Toshiaki Nakazawa
  • Sadao Kurohashi
Abstract

One of the main issues in a word alignment task is the difficulty of handling function words that have no direct translation, which we call unique function words. They are often incorrectly aligned to words in the other language. This problem is especially prominent in language pairs with very different sentence structures. In this paper, we propose a novel approach for handling unique function words. The proposed model monolingually derives unique function words from bilingually generated treelet pairs. The monolingual derivation prevents incorrect alignments for unique function words. The derivation probabilities are estimated from a large monolingual corpus, which is much easier to acquire than a parallel corpus. The proposed alignment model also uses semantic-head dependency trees, in which dependency relations between words become similar across the two languages. Experimental results on an English-Japanese corpus show that the proposed model achieves better alignment and translation quality than the baseline models.

Title and abstract in Japanese

二言語の生成と単言語の派生によるアライメント

単語アライメントタスクにおける主な問題の一つは、機能語の中でも相手言語に対応する語が存在しない機能語の扱いの困難さである。我々はこのような語を孤立機能語と呼ぶ。孤立機能語は、相手言語の何らかの単語に不適切に対応付けられることが多く、これは特に文構造が大きく異なる言語対において顕著である。本論文では、孤立機能語を扱うための新しい手法を提案する。提案モデルは、二言語で生成された部分木ペアから、孤立機能語をそれぞれ単言語で派生することにより、孤立機能語が誤って対応付けられることを防ぐ。派生確率は、対訳コーパスに比べて入手が容易である大規模単言語コーパスから推定する。また提案モデルは、単語同士の依存関係が各言語で近くなるように、意味主辞依存構造木を用いる。英日コーパスでの実験結果から、提案モデルはベースラインモデルと比べてより良いアライメントおよび翻訳精度を実現した。

Similar resources

Description of KYOTO EBMT System in PatentMT at NTCIR-10

This paper describes the "KYOTO" EBMT system that participated in PatentMT at NTCIR-10. When translating between very different language pairs such as Japanese-English, it is very important to handle sentences as tree structures to overcome the difference. Many recent studies incorporate tree structures in some parts of the translation process, but not all the way from model training (parallel sentence alignment) to...

Evaluation and comparison of cognitive flexibility, selective attention and response inhibition in male and female bilingual and monolingual students

In different parts of the world, people speak different languages. Some regions are more linguistically rich, and more than one language is spoken there. The aim of this study was to evaluate and compare the executive functions of the brain, including cognitive flexibility, selective attention and response inhibition, in monolingual and bilingual male and fem...

The Use of Hedges and Boosters in Monolingual and Bilingual EFL Learners’ Academic Writings: The Case of Iranian Male and Female Post-graduate MA Articles

Expressing doubt and certainty in academic writing requires a cautious use of hedges and boosters. Despite their importance, little is known about how they are used in the academic writing of monolingual and bilingual male and female EFL learners. To shed some light on the issue, the present study investigated the use of hedges and boosters in research articles written by monol...

Metalinguistic Awareness and Bilingual vs. Monolingual EFL Learners: Evidence from a Diagonal Bilingual Context

This paper reports a study of 85 Iranian EFL learners in the English Language Department of Urmia University. It explores the possible differences between the performance of 38 Persian monolingual and 47 Turkish-Persian bilingual EFL learners on metalinguistic tasks involving ungrammatical structures and translation. The underlying hypothesis is that bilinguals in diagonal bilingual contexts experience a ...

Inducing Bilingual Lexicons from Small Quantities of Sentence-Aligned Phonemic Transcriptions

We investigate induction of a bilingual lexicon from a corpus of phonemic transcriptions that have been sentence-aligned with English translations. We evaluate existing models that have been used for this purpose, and report two additional models which demonstrate performance improvements. The first performs monolingual segmentation followed by alignment, while the second performs both tasks jo...

Bilingual Segmentation for Alignment and Translation

We propose a method that bilingually segments sentences in languages with no clear delimiter for word boundaries. In our model, we first convert the search for the segmentation into a sequential tagging problem, allowing for a polynomial-time dynamic-programming solution, and incorporate a control to balance monolingual and bilingual information at hand. Our bilingual segmentation algorithm, th...

Publication date: 2012